An application of the Lorenz Curve to Basketball.

The Xavier Musketeers are putting on a good show in the tough Big East conference this year! Part of what makes this team special to watch is the depth of the roster. From leading scorer Souley Boum to 3pt sniper Adam Kunkel, it seems like any person on the roster is liable to go for 20. Coach Sean Miller has done a phenomenal job getting the ball to the players in situations where they have a strong advantage.

But this got me thinking about ways to measure the distribution of points within a basketball team and whether or not “sharing the rock” had any correlation to other performance metrics. In economics there is a concept called the Gini index, which is used to measure the income inequality of a given nation. The graph that this concept is charted on is called a Lorenz curve, named after American economist Max O. Lorenz. While classically the Lorenz curve and the Gini index are used to measure income inequality, I thought that I would have some fun by applying the concepts to basketball.

Basics on the Lorenz Curve.

The Lorenz curve is beautiful in its simplicity but can give us insights into how productivity is distributed. First, line everyone up from lowest income to highest. Along the Y-axis is the cumulative share of total income those people are responsible for. Along the X-axis is the cumulative share of the total population they make up. In plain English, we plot points on the graph based on their contribution to the population and their contribution to total income. When we connect the dots, we have made ourselves a Lorenz curve. This curve is then compared to a 45 degree line that cuts the graph diagonally with a slope of 1 to represent perfect equality.
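To make the plotting recipe concrete for basketball, here is a minimal Python sketch that swaps income for points scored. The function name `lorenz_points` and the five-player box score are made up for illustration, and this is not the code behind the graphs in this post (that lives in the linked repo).

```python
from itertools import accumulate


def lorenz_points(player_points):
    """Return the (x, y) dots of a Lorenz curve, starting at (0, 0).

    x is the cumulative share of the roster, y is the cumulative
    share of the team's points, with players sorted lowest to highest.
    """
    ordered = sorted(player_points)  # lowest scorer first
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one player and a nonzero point total")

    curve = [(0.0, 0.0)]
    for i, running in enumerate(accumulate(ordered), start=1):
        curve.append((i / n, running / total))
    return curve


# Hypothetical five-player box score, not real data.
roster = [5, 10, 15, 30, 40]
for x, y in lorenz_points(roster):
    print(f"{x:.2f} of roster -> {y:.2f} of points")
```

Connecting those dots gives the curve. The further it sags below the 45 degree line, the more the scoring is concentrated in a few players.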

Now, this is where the mathematics of it gets funky. I'm going to put a trigger warning in here cause we're about to talk calculus.

If you take the area in between the perfect equality line and the Lorenz curve and divide it by the total area underneath the equality line, you get a number between 0 and 1 called the Gini coefficient. The Gini coefficient is often multiplied by 100 to give the Gini index, which is a number between 0 and 100. Most people just use the terms interchangeably in casual conversation.

A 1.00 Gini coefficient means perfect inequality.

A 0.00 Gini coefficient means perfect equality.

One simple trick for remembering which one is which is that 1 means one person has everything and 0 means that no one has any more than anyone else.
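The area calculation doesn't actually need calculus when the curve is just dots connected by straight lines. Here is a rough Python sketch that adds up trapezoids under the Lorenz curve. The function name `gini` and the sample rosters are hypothetical, and I'm assuming a plain trapezoid rule here, which may differ slightly from what an off-the-shelf stats package does.

```python
def gini(player_points):
    """Gini coefficient: area between the equality line and the
    Lorenz curve, divided by the area under the equality line."""
    ordered = sorted(player_points)  # lowest scorer first
    total = sum(ordered)
    n = len(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one player and a nonzero point total")

    area_under_lorenz = 0.0
    prev_share = 0.0
    running = 0
    for value in ordered:
        running += value
        share = running / total
        # Each player adds a trapezoid of width 1/n under the curve.
        area_under_lorenz += (prev_share + share) / (2 * n)
        prev_share = share

    area_under_equality = 0.5  # triangle under the 45 degree line
    return (area_under_equality - area_under_lorenz) / area_under_equality


# Hypothetical rosters, not real data.
print(gini([20, 20, 20, 20, 20]))  # everyone scores the same
print(gini([0, 0, 0, 0, 100]))     # one player scores everything
```

One wrinkle with this approach: for a roster of n players the one-man-show case tops out at (n - 1) / n rather than exactly 1, so a five-player example lands at 0.8. The value only approaches 1 as the roster gets large.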

Findings (there were none lmao)

While this was a fun exercise, I could find no correlations between a team's scoring Gini index and any other measure of performance. All of the correlation graphs look like a sneeze :(.
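If you want to put a number on "looks like a sneeze", one option is a Pearson correlation between the Gini values and another stat. The sketch below is a hand-rolled Python version, not the analysis from this post. The `pearson` helper is hypothetical, and the six rows are a hand-picked handful copied from the table at the bottom of this post, so whatever it prints says nothing about the full 76-team picture.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


# (school, gini, pts) rows copied from the table in this post.
# This is an arbitrary subset for illustration only.
sample = [
    ("Seton Hall", 0.2524670, 1439),
    ("Virginia", 0.3266878, 1259),
    ("Connecticut", 0.3470520, 1730),
    ("Xavier", 0.4037801, 1746),
    ("Wisconsin", 0.4868914, 1246),
    ("Utah", 0.5207197, 1572),
]
ginis = [row[1] for row in sample]
points = [row[2] for row in sample]
print(f"r = {pearson(ginis, points):.3f}")
```

A value near 0 means no linear relationship, while values near 1 or -1 would mean the dots line up.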

I think the takeaway here is that teams take different strategies based on personnel, and those strategies can be equally effective. While some teams like Xavier have a full roster of real tough guys ready to #zip-em-up, other schools rely on one main scorer to get the job done. Check out the graphs below to see where your favorite team falls on the spectrum.

Hope you found this informative!

Check out the full code on https://Github.com/PRAY4ENEMY

##                 School         Gini      Points
## 1           Seton Hall    0.2524670        1439
## 2           Notre Dame    0.2663559        1475
## 3            Tennessee    0.2730346        1479
## 4       Boston College    0.2749542        1388
## 5       Oklahoma State    0.2811487        1358
## 6         Georgia Tech    0.2871251        1367
## 7                 Duke    0.2922915        1427
## 8           Iowa State    0.3199851        1341
## 9             Colorado    0.3230380        1580
## 10             Florida    0.3266650        1436
## 11            Virginia    0.3266878        1259
## 12             Indiana    0.3325346        1565
## 13          Providence    0.3429003        1655
## 14         Mississippi    0.3431339        1350
## 15              Baylor    0.3443604        1587
## 16               Texas    0.3449680        1607
## 17           Marquette    0.3467986        1718
## 18         Connecticut    0.3470520        1730
## 19            Illinois    0.3479524        1514
## 20          California    0.3504241        1211
## 21            Stanford    0.3533382        1358
## 22            Oklahoma    0.3536330        1335
## 23             Georgia    0.3536877        1392
## 24     St. John's (NY)    0.3603776        1618
## 25                 TCU    0.3610392        1546
## 26          Washington    0.3670962        1524
## 27           Minnesota    0.3710660        1182
## 28       Arizona State    0.3760486        1510
## 29          Vanderbilt    0.3800716        1439
## 30          Pittsburgh    0.3815385        1560
## 31            Missouri    0.3840558        1661
## 32          Texas Tech    0.3840751        1483
## 33       West Virginia    0.3850142        1527
## 34              Oregon    0.3855798        1469
## 35          Georgetown    0.3858239        1516
## 36            Kentucky    0.3884243        1511
## 37              Purdue    0.3900130        1542
## 38             Rutgers    0.3902279        1388
## 39        Kansas State    0.3925450        1556
## 40              Auburn    0.3943597        1442
## 41            Nebraska    0.3973988        1384
## 42   Mississippi State    0.3979026        1291
## 43             Clemson    0.4016561        1559
## 44              Xavier    0.4037801        1746
## 45           Creighton    0.4058568        1546
## 46       Virginia Tech    0.4081787        1455
## 47    Washington State    0.4107110        1487
## 48             Alabama    0.4107143        1652
## 49      South Carolina    0.4139021        1266
## 50          Ohio State    0.4178000        1524
## 51      Michigan State    0.4179067        1454
## 52           Texas A&M    0.4225067        1484
## 53        Northwestern    0.4246951        1312
## 54           Villanova    0.4250354        1414
## 55                Iowa    0.4312811        1608
## 56            Maryland    0.4358213        1408
## 57       Florida State    0.4382381        1455
## 58              DePaul    0.4423880        1503
## 59             Arizona    0.4465616        1708
## 60            Arkansas    0.4481873        1498
## 61         Wake Forest    0.4489268        1618
## 62        Oregon State    0.4496503        1320
## 63              Butler    0.4502454        1494
## 64          Miami (FL)    0.4525071        1561
## 65     Louisiana State    0.4526316        1330
## 66            Syracuse    0.4531646        1580
## 67                UCLA    0.4538479        1564
## 68            NC State    0.4692400        1658
## 69          Penn State    0.4733653        1458
## 70          Louisville    0.4849333        1250
## 71            Michigan    0.4866102        1475
## 72           Wisconsin    0.4868914        1246
## 73 Southern California    0.4918889        1500
## 74              Kansas    0.4944242        1500
## 75      North Carolina    0.5031684        1657
## 76                Utah    0.5207197        1572